Hidden-Mode Markov Decision Processes

Authors

Samuel P. M. Choi

Dit-Yan Yeung

Nevin L. Zhang

Abstract

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Traditional reinforcement learning (RL) assumes that the environment dynamics are stationary, i.e., that they do not change over time. This assumption, however, is not realistic in many real-world applications. In this paper, a formal model for an interesting subclass of nonstationary environments is proposed. The environment model, called the hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. The HM-MDP is a special case of the partially observable Markov decision process (POMDP). Nevertheless, modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. In this paper the conversion from the former to the latter is discussed. Learning a model of an HM-MDP is the first of two steps required for nonstationary model-based RL. This paper shows how model learning can be achieved by using a variant of the Baum-Welch algorithm. Compared with the POMDP approach, empirical results reveal that the HM-MDP approach significantly reduces computational time as well as the amount of data required.
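To make the model in the abstract concrete, the following is a minimal sketch of sampling a trajectory from an HM-MDP: a hidden mode selects which MDP governs the next state transition, and the mode itself evolves by a Markov chain. This is an illustration, not the authors' code. The function names, the toy numbers, the fixed start in mode 0 and state 0, and the choice to update the mode after the state transition are all assumptions made for the example.

```python
import random

def sample_index(probs, rng):
    """Draw an index according to a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def simulate_hm_mdp(mode_trans, mdp_trans, rewards, policy, steps, seed=0):
    """Sample (state, action, reward) triples from a toy HM-MDP.

    mode_trans[m]      : distribution over the next mode (Markov chain)
    mdp_trans[m][s][a] : distribution over the next state under mode m
    rewards[m][s][a]   : reward under mode m
    The mode is never returned: it is hidden from the agent.
    """
    rng = random.Random(seed)
    mode, state = 0, 0  # assumed fixed start, for simplicity
    trajectory = []
    for _ in range(steps):
        action = policy(state)
        reward = rewards[mode][state][action]
        trajectory.append((state, action, reward))
        # The current mode picks the MDP used for this transition.
        state = sample_index(mdp_trans[mode][state][action], rng)
        # The mode then evolves on its own (assumed update order).
        mode = sample_index(mode_trans[mode], rng)
    return trajectory

# Hypothetical toy problem: 2 modes, 2 states, 2 actions.
MODE_TRANS = [[0.9, 0.1], [0.2, 0.8]]
MDP_TRANS = [
    # mode 0: action 0 stays in place, action 1 switches state
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],
    # mode 1: the effects of the two actions are swapped
    [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
]
REWARDS = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]  # reward 1 in state 1

if __name__ == "__main__":
    for step in simulate_hm_mdp(MODE_TRANS, MDP_TRANS, REWARDS,
                                policy=lambda s: 0, steps=10):
        print(step)
```

From the agent's side only the states, actions, and rewards are visible, so the same action appears to have different effects at different times. That is the nonstationarity the hidden mode is meant to explain.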

Similar resources

Solving Hidden-Semi-Markov-Mode Markov Decision Problems

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. We introduce in this paper Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chai...

Solving Hidden-Mode Markov Decision Problems

Hidden-Mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirect...

Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making

Samuel P. M. Choi, Dit-Yan Yeung, and Nevin L. Zhang. Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.

An Environment Model for Nonstationary Reinforcement Learning

Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basic...
